3D Graphics Engine Essentials
Volume Number: 15
Issue Number: 6
Column Tag: Games Programming
by Eric Lengyel
A crash course on topics that every 3D engine
designer needs to master
This year marks the beginning of an era in which all 3D computer games require
hardware acceleration. Using a 3D accelerator removes the burden of writing a
software-based triangle rasterizer, but designing a high-quality game engine still
requires a lot of work. This article discusses how to perform and efficiently
implement calculations that today's hardware does not handle, such as coordinate
transformations, triangle clipping, and visibility determination. I am going to assume
that the reader is familiar with the fundamentals of vector and matrix mathematics
that I use heavily in this article. The bibliography lists a couple of sources, including
a recent MacTech article, which can provide a good introduction to this material.
The intention is to present all of the material in this article in a platform-independent
manner. It is up to the reader to implement any code which is specific to a particular
3D acceleration API such as OpenGL or QuickDraw3D RAVE. The code accompanying this
article is general in nature and does not depend on any specific API. The structures that
we will use are shown in Listing 1 and include generic vector and matrix containers as
well as structures which encapsulate vertices and triangles. All of the functions which
are described in this article are implemented as methods of the MyEngine class shown
in Listing 2. Details about these structures and functions are given in the sections that
follow.
Coordinate Transformations
At the lowest level, the 3D hardware receives a list of vertices and a list of triangles to
draw on the screen. Each vertex carries (x, y) screen coordinates, a z depth value, a
color, an alpha value, and (u, v) texture map coordinates. Each triangle simply
indicates which three members of the vertex list serve as its corners. (See the Vertex
and Triangle structures shown in Listing 1.) The problem at hand is how to calculate
where on the screen a given point in 3D space should appear for a given camera
position and orientation.
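The full definitions appear in Listing 1; as a rough sketch of the per-vertex data and triangle indices just described (the field names here are illustrative and not necessarily those used in the listing), the structures might look like this:

```cpp
#include <cstdint>

// Hypothetical sketch of the per-vertex data described above; the actual
// Vertex and Triangle structures appear in Listing 1 and may differ.
struct Vertex
{
    float x, y;        // screen coordinates
    float z;           // depth value
    float r, g, b;     // color
    float a;           // alpha value
    float u, v;        // texture map coordinates
};

struct Triangle
{
    // Indices into the vertex list identifying the triangle's three corners.
    std::uint16_t index[3];
};
```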
Every 3D engine needs to deal with three different coordinate systems, which I will
introduce here and discuss in more detail shortly. The first is called world space or
global space. This is the coordinate system in which everything is absolute, and it's the
system in which we specify our camera position and the position of every object in the
world. The second coordinate system is called object space or local space. Each object
has its own space in which the origin corresponds to the position of the object in world
space. The third coordinate system is called camera space. In camera space, the camera
resides at the origin, the x and y coordinate axes are aligned to the coordinate system of
the screen (x points to the right, and y points downward), and the z axis points in the
direction that the camera is facing (and thus the z coordinate in camera space
represents depth). It should be noted here that many 3D graphics systems have the y
axis pointing upward in camera space, but this results in evil things such as
left-handed coordinate systems unless the z axis is also reversed (as is the case with
OpenGL). In any event, it makes life more complicated than it needs to be, so I will
avoid these variants and stick to what I believe to be the more intuitive y-down
system.
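The handedness claim can be verified with a cross product: with x pointing right, y pointing down, and z pointing in the viewing direction, x × y = z, so the y-down camera space is right-handed. A minimal sketch (the vector type here is illustrative, not the engine's actual container from Listing 1):

```cpp
// Minimal 3-component vector with a cross product, used only to check the
// handedness of the y-down camera space described above.
struct Vec3
{
    float x, y, z;
};

Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return Vec3{ a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
}

// right = (1,0,0) and down = (0,1,0) give Cross(right, down) = (0,0,1),
// the direction the camera faces, so the basis is right-handed.
```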
In order to obtain screen coordinates for a given 3D point, we must do two things. First
we have to transform the point into camera space, and then we have to project it onto
the viewing plane which represents our screen. Clipping will take place between these
two operations so that we never project points that will not actually participate in the
final rendering of a scene. The transformation and projection of points is handled quite
nicely in theory by using four-dimensional homogeneous coordinates, although in
practice we will not use these coordinates to their fullest extent for efficiency reasons.
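The two steps can be sketched as follows, assuming the camera's orientation is stored as a 3×3 world-to-camera rotation and the viewing plane lies at distance e in front of the camera (the names Camera, m, and e are illustrative assumptions, not the MyEngine interface):

```cpp
struct Vec3 { float x, y, z; };

// Hypothetical camera state for this sketch: a row-major world-to-camera
// rotation, the camera's world-space position, and the distance e from the
// camera to the viewing plane.
struct Camera
{
    float m[3][3];
    Vec3  position;
    float e;
};

// Step 1: transform a world-space point into camera space by subtracting the
// camera position and rotating into the camera's basis.
Vec3 WorldToCamera(const Camera& cam, const Vec3& p)
{
    Vec3 d{ p.x - cam.position.x, p.y - cam.position.y, p.z - cam.position.z };
    return Vec3{
        cam.m[0][0] * d.x + cam.m[0][1] * d.y + cam.m[0][2] * d.z,
        cam.m[1][0] * d.x + cam.m[1][1] * d.y + cam.m[1][2] * d.z,
        cam.m[2][0] * d.x + cam.m[2][1] * d.y + cam.m[2][2] * d.z };
}

// Step 2: project a camera-space point onto the viewing plane z = e by
// dividing by depth. (Clipping happens before this divide, so z > 0 here.)
void Project(const Camera& cam, const Vec3& p, float& sx, float& sy)
{
    sx = cam.e * p.x / p.z;
    sy = cam.e * p.y / p.z;
}
```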
Let's begin with a transformation in normal 3D coordinates. Suppose that we had a
scene that contained a single cube. In the cube's object space, it is convenient to place